The Recent De Novo Origin of Protein C-Termini
Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the...
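Main authors: | Andreatta, Matthew E.; Levine, Joshua A.; Foy, Scott G.; Guzman, Lynette D.; Kosinski, Luke J.; Cordes, Matthew H.J.; Masel, Joanna |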
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494051/ https://www.ncbi.nlm.nih.gov/pubmed/26002864 http://dx.doi.org/10.1093/gbe/evv098 |
_version_ | 1782380021617262592 |
---|---|
author | Andreatta, Matthew E. Levine, Joshua A. Foy, Scott G. Guzman, Lynette D. Kosinski, Luke J. Cordes, Matthew H.J. Masel, Joanna |
author_facet | Andreatta, Matthew E. Levine, Joshua A. Foy, Scott G. Guzman, Lynette D. Kosinski, Luke J. Cordes, Matthew H.J. Masel, Joanna |
author_sort | Andreatta, Matthew E. |
collection | PubMed |
description | Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure. |
format | Online Article Text |
id | pubmed-4494051 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-44940512015-07-09 The Recent De Novo Origin of Protein C-Termini Andreatta, Matthew E. Levine, Joshua A. Foy, Scott G. Guzman, Lynette D. Kosinski, Luke J. Cordes, Matthew H.J. Masel, Joanna Genome Biol Evol Research Article Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure. Oxford University Press 2015-05-21 /pmc/articles/PMC4494051/ /pubmed/26002864 http://dx.doi.org/10.1093/gbe/evv098 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Andreatta, Matthew E. Levine, Joshua A. Foy, Scott G. Guzman, Lynette D. Kosinski, Luke J. Cordes, Matthew H.J. Masel, Joanna The Recent De Novo Origin of Protein C-Termini |
title | The Recent De Novo Origin of Protein C-Termini |
title_full | The Recent De Novo Origin of Protein C-Termini |
title_fullStr | The Recent De Novo Origin of Protein C-Termini |
title_full_unstemmed | The Recent De Novo Origin of Protein C-Termini |
title_short | The Recent De Novo Origin of Protein C-Termini |
title_sort | recent de novo origin of protein c-termini |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4494051/ https://www.ncbi.nlm.nih.gov/pubmed/26002864 http://dx.doi.org/10.1093/gbe/evv098 |
work_keys_str_mv | AT andreattamatthewe therecentdenovooriginofproteinctermini AT levinejoshuaa therecentdenovooriginofproteinctermini AT foyscottg therecentdenovooriginofproteinctermini AT guzmanlynetted therecentdenovooriginofproteinctermini AT kosinskilukej therecentdenovooriginofproteinctermini AT cordesmatthewhj therecentdenovooriginofproteinctermini AT maseljoanna therecentdenovooriginofproteinctermini AT andreattamatthewe recentdenovooriginofproteinctermini AT levinejoshuaa recentdenovooriginofproteinctermini AT foyscottg recentdenovooriginofproteinctermini AT guzmanlynetted recentdenovooriginofproteinctermini AT kosinskilukej recentdenovooriginofproteinctermini AT cordesmatthewhj recentdenovooriginofproteinctermini AT maseljoanna recentdenovooriginofproteinctermini |