Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs?
BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs...
Main authors: | Prabh, Neel; Rödelsperger, Christian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888513/ https://www.ncbi.nlm.nih.gov/pubmed/27245157 http://dx.doi.org/10.1186/s12859-016-1102-x |
_version_ | 1782434864704782336 |
---|---|
author | Prabh, Neel Rödelsperger, Christian |
author_facet | Prabh, Neel Rödelsperger, Christian |
author_sort | Prabh, Neel |
collection | PubMed |
description | BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs. RESULTS: Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129 and 7997 (42–81 %) of predicted orphan genes are expressed and 3818–7545 (39–76 %) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10 % of orphan genes. CONCLUSIONS: Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes. |
format | Online Article Text |
id | pubmed-4888513 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4888513 2016-06-08 Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? Prabh, Neel Rödelsperger, Christian BMC Bioinformatics Research Article BACKGROUND: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes are real, genomic artifacts, or represent non-coding RNAs. RESULTS: Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129 and 7997 (42–81 %) of predicted orphan genes are expressed and 3818–7545 (39–76 %) of orphan genes are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10 % of orphan genes. CONCLUSIONS: Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.
BioMed Central 2016-05-31 /pmc/articles/PMC4888513/ /pubmed/27245157 http://dx.doi.org/10.1186/s12859-016-1102-x Text en © Prabh and Rödelsperger. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Prabh, Neel Rödelsperger, Christian Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title | Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title_full | Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title_fullStr | Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title_full_unstemmed | Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title_short | Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs? |
title_sort | are orphan genes protein-coding, prediction artifacts, or non-coding rnas? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888513/ https://www.ncbi.nlm.nih.gov/pubmed/27245157 http://dx.doi.org/10.1186/s12859-016-1102-x |
work_keys_str_mv | AT prabhneel areorphangenesproteincodingpredictionartifactsornoncodingrnas AT rodelspergerchristian areorphangenesproteincodingpredictionartifactsornoncodingrnas |
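
The two assumptions stated in the description field (a transcribed sequence is a real biological entity; a sequence supported by proteome data or showing a depletion of non-synonymous substitutions is protein-coding) can be read as a small decision rule. The Python sketch below is only an illustration of that rule, not the authors' pipeline: the class name, field names, and the three output labels are hypothetical, and it assumes each evidence type has already been reduced to a yes/no flag by some upstream analysis that the record does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrphanEvidence:
    """Evidence flags for one predicted orphan gene (hypothetical field names)."""
    transcribed: bool        # expression evidence, e.g. from RNA-seq
    proteome_support: bool   # supported by proteomic data
    nonsyn_depleted: bool    # depletion of non-synonymous substitutions


def classify(evidence: OrphanEvidence) -> str:
    """Apply the two assumptions from the abstract; labels are illustrative."""
    # Second assumption: proteome support OR depletion of non-synonymous
    # substitutions is taken as evidence of a protein-coding gene.
    if evidence.proteome_support or evidence.nonsyn_depleted:
        return "protein-coding"
    # First assumption: a transcribed sequence is a real biological entity,
    # even if its coding status is not established by the other evidence.
    if evidence.transcribed:
        return "real, coding status unresolved"
    # No evidence of either kind.
    return "unsupported"


# Example: a gene with strong constraint but no expression evidence,
# like the three experimentally validated cases mentioned in the abstract.
constrained_only = OrphanEvidence(
    transcribed=False, proteome_support=False, nonsyn_depleted=True
)
label = classify(constrained_only)
```

Checking the coding evidence first means a gene under constraint is still called protein-coding when expression data are missing, which matches the abstract's report of validated gene structures that lacked RNA-seq support.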