
Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability

Bibliographic Details
Main authors: Harrison, Paul M.; Zheng, Deyou; Zhang, Zhaolei; Carriero, Nicholas; Gerstein, Mark
Format: Text
Language: English
Published: Oxford University Press 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1087782/
https://www.ncbi.nlm.nih.gov/pubmed/15860774
http://dx.doi.org/10.1093/nar/gki531

Full description

Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (PΨgs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human PΨgs and also dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPΨg), which is disabled but nonetheless transcribed. TPΨgs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPΨg. To elucidate their role, we identified human TPΨgs by mapping expressed sequences onto PΨgs and, reciprocally, extracting TPΨgs from known mRNAs. We consider only those PΨgs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further expression verification. Overall, we find 166–233 TPΨgs (∼4–6% of PΨgs). Proteins/transcripts with the highest numbers of homologous TPΨgs generally have many homologous PΨgs and are abundantly expressed. TPΨgs are significantly over-represented near both the 5′ and 3′ ends of genes; this suggests that TPΨgs can be formed through gene–promoter co-option or through intrusion into untranslated regions. However, roughly half of the TPΨgs are located away from genes in the intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPΨgs are unlike other PΨgs and processed genes in the following ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; (ii) only 5% of human TPΨgs have potential orthologs in mouse. This latter finding indicates that the vast majority of TPΨgs are lineage-specific, which is likely linked to the well-documented, extensive lineage-specific SINE/LINE activity. The list of TPΨgs is available at: (or) .
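
The filtering criterion in the abstract (the same coding-sequence disablement must be detected in both the expressed and the genomic copy) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names `premature_stops` and `shares_disablement` are hypothetical; both sequences are assumed to be already aligned without gaps to the parent coding sequence in reading frame 0; only premature stop codons are checked (frameshifts, which need a real alignment, are ignored); and a single shared stop is assumed to be sufficient.

```python
# Hypothetical sketch of the "shared disablement" check, NOT the published method.
# Assumptions: gapless alignment to the parent CDS in frame 0, and only
# premature stop codons count as disablements (frameshifts are not modelled).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq: str) -> set[int]:
    """Return codon indices of in-frame stop codons, ignoring the final codon."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    return {
        i
        for i in range(n_codons - 1)  # the last codon may be the genuine stop
        if seq[3 * i : 3 * i + 3] in STOP_CODONS
    }


def shares_disablement(genomic: str, expressed: str) -> bool:
    """True if at least one premature stop in the genomic copy recurs in the transcript."""
    return bool(premature_stops(genomic) & premature_stops(expressed))
```

For example, a genomic copy and a transcript that both read `ATGAAATAGCCCTGA` share the `TAG` at codon 2 and pass the check, whereas a transcript reading `CAG` at that codon fails it, because the disablement is absent from the expressed sequence.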

Journal: Nucleic Acids Res
Publication date: 2005-04-28
Source record: PubMed (National Center for Biotechnology Information), pubmed-1087782
Rights: © The Author 2005. Published by Oxford University Press. All rights reserved