Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs?

BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs...

Bibliographic Details
Main Authors: Prabh, Neel, Rödelsperger, Christian
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888513/
https://www.ncbi.nlm.nih.gov/pubmed/27245157
http://dx.doi.org/10.1186/s12859-016-1102-x
author Prabh, Neel
Rödelsperger, Christian
collection PubMed
description BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs. RESULTS: Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129 and 7997 (42–81 %) of predicted orphan genes are expressed and 3818–7545 (39–76 %) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10 % of orphan genes. CONCLUSIONS: Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.
format Online
Article
Text
id pubmed-4888513
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4888513 2016-06-08 Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? Prabh, Neel Rödelsperger, Christian BMC Bioinformatics Research Article
BioMed Central 2016-05-31 /pmc/articles/PMC4888513/ /pubmed/27245157 http://dx.doi.org/10.1186/s12859-016-1102-x Text en © Prabh and Rödelsperger. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs?
topic Research Article