Evaluating the protein coding potential of exonized transposable element sequences

BACKGROUND: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE s...

Bibliographic Details
Main Authors: Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2203978/
https://www.ncbi.nlm.nih.gov/pubmed/18036258
http://dx.doi.org/10.1186/1745-6150-2-31
author Piriyapongsa, Jittima
Rutledge, Mark T
Patel, Sanil
Borodovsky, Mark
Jordan, I King
collection PubMed
description BACKGROUND: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic, codon-frequency-based analysis of the protein coding potential of TE-derived exons.
RESULTS: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated, experimentally characterized protein data sets did not turn up many more cases than had been previously detected, and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences.
CONCLUSION: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression rather than as coding sequences, and that they may act through a variety of double-stranded RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence.
REVIEWERS: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.).
format Text
id pubmed-2203978
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2203978 2008-01-17 Biol Direct Research BioMed Central 2007-11-26 /pmc/articles/PMC2203978/ /pubmed/18036258 http://dx.doi.org/10.1186/1745-6150-2-31 Text en Copyright © 2007 Piriyapongsa et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluating the protein coding potential of exonized transposable element sequences
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2203978/
https://www.ncbi.nlm.nih.gov/pubmed/18036258
http://dx.doi.org/10.1186/1745-6150-2-31